energy <- read_csv(here("data", "energy.csv"))
## Parsed with column specification:
## cols(
## month = col_character(),
## res_total = col_double(),
## ind_total = col_double()
## )
energy_ts <- energy %>%
mutate(date = tsibble::yearmonth(month)) %>% #creates new column with date in time series class
as_tsibble(key = NULL, index = date) #converts to tsibble with date column as the time index
ggplot(data = energy_ts, aes(x = date, y = res_total)) +
geom_line() +
labs(y = "Residential energy consumption \n (Trillion BTU)")
- Is there an overall trend? > Increasing trend overall, but stability (and possibly a slight decreasing trend) starting around 2005
Is there seasonality? > Clear seasonality, with a dominant seasonal feature and also a secondary peak each year - that secondary peak has increased substantially
Any cyclicality evident?
Any other notable patterns, outliers, etc.? > No notable cyclicality or outliers
energy_ts %>%
gg_season(y = res_total) + #feasts::gg_season()
theme_minimal() +
labs(x = "month",
y = "residential energy consumption (trillion BTU)")
## Warning in NextMethod("["): Incompatible methods (">=.Date", ">=.vctrs_vctr")
## for ">="
## Warning in NextMethod("["): Incompatible methods ("<=.Date", "<=.vctrs_vctr")
## for "<="
## Warning in NextMethod("["): Incompatible methods (">=.Date", ">=.vctrs_vctr")
## for ">="
## Warning in NextMethod("["): Incompatible methods ("<=.Date", "<=.vctrs_vctr")
## for "<="
Major takeaways from this seasonplot:
energy_ts %>% gg_subseries(res_total)
Our takeaway here is similar: there is clear seasonality (higher values in winter months), with an increasingly evident second peak in June/July/August. This reinforces our takeaways from the raw data and seasonplots.
STL is a versatile and robust method for decomposing time series. Acronym for “Seasonal and Trend decomposition using Loess”, while Loess is a method for estimating nonlinear relationships.
STL allows seasonality to vary over time (a major difference from classical decomposition), and important here since we do see changes in seasonality.
# Find STL decomposition
dcmp <- energy_ts %>%
model(STL(res_total ~ season()))
#View the components
#components(dcmp)
#Visualize the decomposed components
components(dcmp) %>% autoplot() +
theme_minimal()
We use the ACF to explore autocorrelation (here, we would expect seasonality to be clear from the ACF):
energy_ts %>%
ACF(res_total) %>%
autoplot()
We see that observations separated by 12 months are the most highly correlated, reflecting strong seasonality we see in all of our other exploratory visualizations.
Here we are using ETS, which technically uses different optimization than Holt-Winters exponential smoothing, but is otherwise the same.
To create the model below, we specify the model type(exponential smoothing, ETS), then tell it what type of seasonality it should assume using the season("") expression, where “N” = non-seasonal (try changing it to this to see how unimpressive the forecast becomes!), “A” = additive, “M” = multiplicative. Here, we’ll say seasonality is multiplicative due to the change in variance over time and also within the secondary summer peak:
#Create the model:
energy_fit <- energy_ts %>%
model(
ets = ETS(res_total ~ season("M"))
)
#Forecast using the model 10 years into the future:
energy_forecast <- energy_fit %>%
forecast(h = "10 years")
#Plot just the forecasted values (with 80 & 95% CIs):
energy_forecast %>%
autoplot()
#Or plot it added to the original data:
energy_forecast %>%
autoplot(energy_ts)
We can use broom::augment() to append our original tsibble with what the model predicts the energy usage would be based on the model. Let’s do a little exploring through visualization.
First, use broom::augment() to get the predicted values and residuals:
#Append the predicted values (and residuals) to original energy data
energy_predicted <- broom::augment(energy_fit)
#Use View(energy_predicted) to see the resulting dataframe
Now, plot the actual energy values (res_total), and the predicted values (stored as .fitted) atop them:
ggplot(data = energy_predicted) +
geom_line(aes(x = date, y = res_total)) +
geom_line(aes(x = date, y = .fitted), color = "red")
These look like pretty good predictions!
Now let’s explore the residuals. Remember, some important considerations: Residuals should be uncorrelated, centered at 0, and ideally normally distributed. One way we can check the distribution is with a histogram:
ggplot(data = energy_predicted, aes(x = .resid)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
We see that this looks relatively normally distributed, and centered at 0 (we could find summary statistics beyond this to further explore).
This is the END of what you are expected to complete for Part 1. on time series exploration and forecasting. The section below shows how to use other forecasting models (seasonal naive and autoregressive integrated moving average, the latter of which was not covered in lecture).
# Fit 3 different forecasting models (ETS, ARIME, SNAIVE):
energy_fit_multi <- energy_ts %>%
model(
ets = ETS(res_total ~ season("M")),
arima = ARIMA(res_total),
snaive = SNAIVE(res_total)
)
#Forecast 3 years into the future (from data end date):
multi_forecast <- energy_fit_multi %>%
forecast(h = "3 years")
#Plot the 3 forecasts:
multi_forecast %>%
autoplot(energy_ts)
#Or just view the forecasts (note the similarity across models):
multi_forecast %>%
autoplot()
We can see that all three of these models (exponential smoothing, seasonal naive, and ARIMA) yield similar forecasting results.